Similarity search optimization using recently-biased symbolic representation
نویسندگان
چکیده
Dimension reduction is one of the important requirements for a successful representation to improve the efficiency of extracting the attracting trend patterns on the time series. Furthermore, an efficient and accurate similarity searching on a huge time series data set is a crucial problem in data mining preprocessing. Symbolic representations have proven to be a very effective way to reduce the dimensionality of time series without loss of knowledge. However, symbolic representations suffer from another challenges promoted by the possibility of losing some principal patterns due to the impractical utilization of dealing with the whole data with the same weight. The methodology utilized in this paper is proposed to overcome symbolic representation pattern mismatch. Moreover, the data dimensionality is reduced by keeping more detail on recent-pattern data and less detail on older ones using modified sliding window controlled by the corresponding classification error rate. Experimental results were made on the UCR standard dataset comparing with the state of the art techniques. The proposed techniques showed promising results. Furthermore, practical experiments were made on the Egyptian stock market indices EGX 30, EGX 70 and EGX 100. The discovered patterns showed the accuracy and effectiveness of the proposed approach.
منابع مشابه
A New Algorithm for Optimization of Fuzzy Decision Tree in Data Mining
Decision-tree algorithms provide one of the most popular methodologies for symbolic knowledge acquisition. The resulting knowledge, a symbolic decision tree along with a simple inference mechanism, has been praised for comprehensibility. The most comprehensible decision trees have been designed for perfect symbolic data. Classical crisp decision trees (DT) are widely applied to classification t...
متن کاملEnhancing the Symbolic Aggregate Approximation Method Using Updated Lookup Tables
Similarity search in time series data mining is a problem that has attracted increasing attention recently. The high dimensionality and large volume of time series databases make sequential scanning inefficient to tackle this problem. There are many representation techniques that aim at reducing the dimensionality of time series so that the search can be handled faster at a lower dimensional sp...
متن کاملComposite Kernel Optimization in Semi-Supervised Metric
Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...
متن کاملAn Empirical Study of Similarity Search in Stock Data
Using certain artificial intelligence techniques, stock data mining has given encouraging results in both trend analysis and similarity search. However, representing stock data effectively is a key issue in ensuring the success of a data mining process. In this paper, we aim to compare the performance of numeric and symbolic data representation of a stock dataset in terms of similarity search. ...
متن کاملAccurate Deep Representation Quantization with Gradient Snapping Layer for Similarity Search
Recent advance of large scale similarity search involves using deeply learned representations to improve the search accuracy and use vector quantization methods to increase the search speed. However, how to learn deep representations that strongly preserve similarities between data pairs and can be accurately quantized via vector quantization remains a challenging task. Existing methods simply ...
متن کامل